To evaluate the nature of the neural code in the cerebral cortex, we have used a combination of theory and 
experiment to assess how information is represented in a realistic cortical population response. We have shown 
how a sensory stimulus could be estimated on a biologically realistic time scale, given brief individual responses 
from a population of neurons with similar response properties. For neurons in extrastriate motion area MT, a 
combinatorial code, one that keeps track of the cell identity of action potentials and silences in individual neurons 
across the population, carries twice as much information about visual motion as does spike count averaged over 
the same group of cells. The combinatorial code is more informative because of the diverse firing rate dynamics 
of MT neurons in response to constant motion stimuli, and is robust to neuron-neuron correlations. We provide 
a theoretical motivation for these observations that challenges commonly held ideas about the nature of cortical 
coding at the level of single neurons and neural populations. 



Our understanding of sensory representations 
in the cerebral cortex is built on two fundamental 
ideas, each of which emerged to some degree from 
the study of simpler systems. First is the concept of 
rate coding, that neurons respond to sensory inputs 
by changing the rate at which they generate action 
potentials or spikes 1-3. Second is the idea of feature 
selectivity, that neurons respond not to raw stimulus 
variables such as light intensity but rather to specific 
features such as spatial gradients and their orientation 
or motion 4-7. Many neurons in a small neighborhood 
of the cortex seem to have very similar feature 
selectivity 8,9, suggesting that averaging over this 
apparently redundant population is an essential com- 
ponent of the cortical code. 

There is no question that neurons respond to 
sensory inputs by changing the rate at which they gen- 
erate action potentials. However, the fact that neurons 
modulate their spike rate in response to sensory stim- 
uli is a statement about their average behavior in ex- 
periments where the same stimulus is repeated many 
times and responses are averaged to create the peri- 
stimulus time histogram, or PSTH. The brain itself, 
though, has no way of computing the average rate of 
an individual neuron, and all decisions must be based 
on single examples of the spike trains, albeit from a 
population of cells. Furthermore, many behaviors are 
guided by sensory information available in small time 
windows, so that each neuron can contribute only a 
handful of action potentials 10,11. This raises the critical 
issue of how the nervous system extracts information 
from such a small number of events. 

While "rate coding" is viewed as well estab- 
lished, codes based on the timing of spikes, whether in 
sequence from a single neuron or across a population, 
have been viewed as more speculative, except in spe- 
cial cases. In particular, the fact that neurons in cortex 
generate spike trains that are approximately described 
by a modulated Poisson process means that the (time 
varying) rate provides a nearly complete description 
of the distribution out of which spikes are drawn, and 
this has been taken as prima facie evidence against a 
timing code. We will argue that this informal infer- 
ence from the statistics of spike trains to the structure 
of the neural code is incorrect, and we will reformu- 
late the coding problem. Rather than seeing the issue 
as "rate codes" vs. "timing codes," we suggest that 
one can ask directly about the nature of the symbols 
that carry sensory information. 

Our paper begins by showing analytically that 
once rates vary as a function of time, the best esti- 
mate of rate from a single example of the spike train 
depends on the detailed timing of spikes. This ef- 
fect is clear in experiments conducted in visual area 
MT as well, highlighting the limitations of the usual 
"rate vs. timing" formulation of the coding problem. 
We then demonstrate the analogous effect in populations 
of neurons, showing that diversity in the dynamics 
of responses, even among neurons with nominally 
identical feature selectivity, opens the possibility 
of a combinatorial code 10,12,13 in which stimulus 
features are represented in patterns of spiking and 
silence across a population of MT neurons. Even in 
populations of modest size (N ~ 20 cells), these pat- 
terns provide more than twice the amount of informa- 
tion about the stimulus than is available from pool- 
ing the spike counts. While we do not know if the 
cortex actually uses additional information in patterns 
of spikes and silence across the population, we show 
that the additional information does not require any 
unusual properties of the neural spiking statistics, and 
thus could exist in almost any population of cortical 
neurons. 

Results 

Relationship of firing rate and spike timing for Poisson neurons 

The time course of the firing rate of a neuron 
can be estimated by accumulating a peri-stimulus time 
histogram (PSTH) across multiple responses to the 
same stimulus or motor response. In reality, however, 
the nervous system does not have the opportunity to 
estimate the underlying rate of a neuron's response by 
averaging across multiple nearly-identical behavioral 
epochs. If it does estimate the time-modulated rate of 
a neuron, r(t), then it must do so on the basis of one 
sequence of all-or-nothing events at specified times. 

To see how this would work, consider a neural 
spike train that is a modulated Poisson process and 
assume that we observe the spike train in a window 
of time 0 < t < T. For a Poisson neuron with 
rate r(t), the probability density for spikes to occur 
at times $t_1, t_2, \ldots, t_n$ in the window is given by 

$$P[t_1, t_2, \ldots, t_n \mid r(t)] = \frac{1}{n!} \exp\left[-\int_0^T dt\, r(t)\right] r(t_1)\, r(t_2) \cdots r(t_n). \tag{1}$$



To estimate the rate r(t) from observations on the 
spike train, we use Bayes' rule to construct the proba- 
bility distribution of rates given our observations: 



$$P[r(t) \mid t_1, t_2, \ldots, t_n] = \frac{P[t_1, t_2, \ldots, t_n \mid r(t)]\, P[r(t)]}{P[t_1, t_2, \ldots, t_n]}, \tag{2}$$

where P[r(t)] is the probability distribution for the 
rates r(t) accessed by the neuron over its dynamic 
range of responses, and $P[t_1, t_2, \ldots, t_n]$ is the total 
probability of observing this sequence of spikes, averaged 
over stimuli. If $\bar{r} = \frac{1}{T}\int_0^T dt\, r(t)$ is the average 
of this rate over the whole window T, then some algebra 
reveals that: 

$$P(\bar{r} \mid t_1, t_2, \ldots, t_n) \propto \exp(-T\bar{r})\, \langle r(t_1)\, r(t_2) \cdots r(t_n) \rangle_{\bar{r}}, \tag{3}$$

where $\langle \cdots \rangle_{\bar{r}}$ denotes an average over all the functions 
r(t) used by the neuron that have the same average 
value $\bar{r}$. The important aspect of Equation [3] is that 
the timing of the individual spikes has not disappeared 
from the result; to estimate the underlying spike rate 
from a single response, we need to know the timing of 
the spikes. 
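
The dependence on spike timing can be illustrated numerically. The following is a minimal sketch, not the analysis applied to the MT data: it assumes a toy ensemble of only two hypothetical rate functions with equal prior probability, and the profile shapes, parameter values, and helper names are all illustrative assumptions. It evaluates the Poisson likelihood for each candidate r(t) and compares the posterior estimate of the window-averaged rate for two spike trains that have the same count but different spike times.

```python
import numpy as np

T = 0.032                      # window duration in seconds (32 ms)
t = np.linspace(0.0, T, 3201)  # fine time grid for numerical integration
dt = t[1] - t[0]

# Two hypothetical rate functions (spikes/s); illustrative assumptions only.
rates = {
    "early_high": 20.0 + 180.0 * np.exp(-t / 0.005),  # strong onset transient
    "flat_low": np.full_like(t, 40.0),                 # constant rate
}

def mean_rate(r):
    """Window-averaged rate, (1/T) * integral of r(t) dt (trapezoid rule)."""
    return np.sum(0.5 * (r[1:] + r[:-1])) * dt / T

def log_likelihood(spike_times, r):
    """Log Poisson likelihood up to the 1/n! constant:
    -integral of r(t) dt + sum over spikes of log r(t_i)."""
    integral = mean_rate(r) * T
    return -integral + np.sum(np.log(np.interp(spike_times, t, r)))

def posterior_mean_rate(spike_times):
    """Posterior expectation of the window-averaged rate (equal priors)."""
    names = list(rates)
    logp = np.array([log_likelihood(spike_times, rates[k]) for k in names])
    post = np.exp(logp - logp.max())
    post /= post.sum()
    means = np.array([mean_rate(rates[k]) for k in names])
    return float(post @ means)

# Same spike count (n = 2), different spike times.
early = posterior_mean_rate(np.array([0.001, 0.003]))
late = posterior_mean_rate(np.array([0.025, 0.030]))
print(f"estimate from two early spikes: {early:.1f} spikes/s")
print(f"estimate from two late spikes:  {late:.1f} spikes/s")
```

In this toy ensemble the two early spikes favor the transient profile, which has the higher window-averaged rate, so the estimate is larger than for two late spikes even though the count is identical.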

Figure 1 amplifies the significance of Equation 
[3] on the basis of recordings from our sample of 36 
MT neurons. For each neuron, we measured the un- 
derlying rate by averaging across many trials to create 
PSTHs with a time resolution of 2 ms. We then ana- 
lyzed each trial individually, taking all 32 ms windows 
in which there were n = 1, 2, . . . , 9 spikes and aver- 
aging the underlying rates associated with each spike 
count. In Figure 1a, we consider windows that contain 
n = 2 spikes, and ask whether the rate in this 
window, defined as an average over trials, was related 
consistently to the timing of the spikes. The variation 
in the color across the two dimensional map indicates 
that spike timing was related to underlying rate in a 
complex way. To ask whether the same complex rela- 
tionship would appear when the within-trial structure 

Figure 1: Relationship between underlying rate and 
spike count for single neurons and neural populations. 
(a) Relationship between spike times and the underlying 
mean rate for 32 ms analysis windows that contain 
two spikes, based on the data from a single MT 
neuron. The times of the two spikes are indicated on 
the abscissa and ordinate, and the color code indicates 
mean rate. (b) Same as (a), but after the spikes in 
each time window had been shuffled across trials. (c) 
Conditional mean rate, the average of observed rates 
given a particular spike count in a 32 ms analysis interval, 
plotted as a function of the number of spikes in 
the window. Symbols show rate estimated from single 
neurons individually then averaged over our sample 
of 36; gray ribbon shows rate estimated by counting 
spikes across single trial draws from each of the 
36 neurons (pooled count). Error bars on the one cell 
pool indicate standard deviations of the mean over the 
36 cells recorded. Dashed line shows the naive rate, 
the number of spikes indicated on the abscissa divided 
by the duration of the analysis interval, n/T. (d) Information 
about underlying rate from spike counts as 
a function of the number of cells in the population. 
Different curves show calculations based on analysis 
intervals of different durations, indicated by numbers to 
the right of each curve. 



of spike timing was abolished, we randomized the trial 
identity of the spike train independently within each 2 
ms time window. The modulation of rate across the 
map in Figure 1b shows that the relationship between 
spike timing and underlying rate persists, as predicted 
by Equation [3] even when we enforce the Poisson na- 
ture of the spike train and eliminate any correlations 
across time. Similar patterns appear when the same 
analysis is performed on data from Poisson model 
neurons with sinusoidal modulations of the underly- 
ing firing rate (data not shown). 

We emphasize that these results about the role 
of timing in the estimation of rate are contrary to a 
widely held intuition, namely that for Poisson pro- 
cesses counting spikes in a window provides the best 
estimate of the underlying rate. This is exactly cor- 
rect for constant rates, but once rates vary in time the 
estimation problem changes its structure. To explore 
this further, we look in more detail at the relationship 
between the spike count in a single trial and the un- 
derlying rate as estimated by the PSTH. Although it 
is well known that, especially when the counts are 
small, there will be significant random errors in estimating 
the rate, Figure 1c (open circles) shows that 
there are also large systematic errors. For almost all 
counts, the average rate in windows with a particu- 
lar spike count n falls far below the naively expected 
value of spike count divided by analysis window du- 
ration (n/T). When we pool spikes from across our 
full sample of 36 neurons (Fig. 1c, filled circles), the 
rates are much closer to the "count per time" estimate, 
but only over a highly restricted dynamic range. 
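
One source of such systematic error can be sketched with a toy calculation. The example below assumes a hypothetical uniform prior over window-averaged rates between 1 and 100 spikes/s and Poisson counts in a 32 ms window; these choices are illustrative and are not taken from the MT recordings. Because the count in a window depends only on the window-averaged rate, Bayes' rule gives the conditional mean rate for each count exactly, and it can be compared with the naive estimate n/T.

```python
import numpy as np
from math import factorial

T = 0.032                                       # 32 ms analysis window
rates = np.linspace(1.0, 100.0, 100)            # hypothetical rates (spikes/s)
prior = np.full(rates.size, 1.0 / rates.size)   # assumed uniform prior

def conditional_mean_rate(n):
    """E[rate | count = n] when the count is Poisson with mean rate * T."""
    likelihood = np.exp(-rates * T) * (rates * T) ** n / factorial(n)
    posterior = likelihood * prior
    posterior /= posterior.sum()
    return float(posterior @ rates)

for n in range(10):
    print(f"n = {n}: conditional mean = {conditional_mean_rate(n):6.1f} spikes/s, "
          f"naive n/T = {n / T:6.1f} spikes/s")
```

In this sketch the conditional mean is confined to the range of rates the model neuron actually uses, so for the larger counts it falls well below n/T. The size of the effect depends on the assumed prior.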

To assess the cost of this reduced dynamic 
range, we ask directly how much information the 
spike counts provide about the underlying rates in our 
sample of data from MT. Increasing either the time 
window for counting or the number of neurons used 
in the analysis increased the information about rate 
from counts, as shown in Figure 1d. But these potential 
gains are constrained by the time scales of behavior: 
for the example of smooth pursuit eye movements, 
which are driven by the population responses 
in MT 14, the time window of analysis of visual motion 
is approximately 25 ms 15. In a 24-ms analysis interval 
(Fig. 1d), even pooling across 36 neurons allowed 
spike counts to provide much less than 2 bits of information 
about the underlying mean rate of the population, 
meaning that fewer than 4 different values of rate can 
be distinguished reliably. In contrast, the trajectory 
of the smooth pursuit behavior itself provides roughly 
12 bits of information about the parameters of target 
motion . While the brain probably pools over more 
than 36 neurons, these results certainly suggest that 
we should explore other possible coding schemes. 
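
The qualitative dependence of count information on pool size and window duration can be reproduced in a toy model. The sketch below assumes N identical, independent Poisson units that share one rate drawn from a hypothetical uniform prior (1 to 100 spikes/s); it is not the calculation used for Figure 1d, and its numerical values should not be compared with the MT results. It computes the mutual information between the rate and the pooled spike count exactly, up to truncation of the count distribution.

```python
import numpy as np
from math import lgamma

rates = np.linspace(1.0, 100.0, 100)            # hypothetical rates (spikes/s)
prior = np.full(rates.size, 1.0 / rates.size)   # assumed uniform prior

def count_information(n_cells, window):
    """Mutual information (bits) between rate and the pooled Poisson count."""
    mean_counts = rates * window * n_cells       # expected pooled count per rate
    n_max = int(mean_counts.max() + 10.0 * np.sqrt(mean_counts.max()) + 20)
    n = np.arange(n_max + 1)
    log_fact = np.array([lgamma(k + 1.0) for k in n])
    # p_n_given_r[k, j] = Poisson probability of count n[j] at rates[k]
    log_p = (-mean_counts[:, None]
             + n[None, :] * np.log(mean_counts[:, None])
             - log_fact[None, :])
    p_n_given_r = np.exp(log_p)
    p_n = prior @ p_n_given_r
    ratio = np.where(p_n_given_r > 0, p_n_given_r / p_n[None, :], 1.0)
    return float(np.sum(prior[:, None] * p_n_given_r * np.log2(ratio)))

for window in (0.008, 0.024, 0.064):
    for n_cells in (1, 9, 36):
        bits = count_information(n_cells, window)
        print(f"T = {1000 * window:4.0f} ms, N = {n_cells:2d}: {bits:.2f} bits")
```

An information of I bits corresponds to at most 2^I perfectly distinguishable values, which is the sense in which less than 2 bits implies fewer than 4 distinguishable rates.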

Extra information about stimulus properties from patterns of spikes 

The coexistence of time-dependent Poisson fir- 
ing with the importance of spike timing in single neu- 
rons has an analog in populations of neurons. Thus, if 
we have a group of cells in which rates vary across the 
population, then the combinatorial patterns of spikes 
and silence in a given small window of time may 
provide extra information beyond that available from 
pooling and counting the total number of spikes. What 
is critical for thinking about this possibility in the cor- 
tex is that even neurons with similar feature selectivity 
can have very different response dynamics, enabling 
combinatorial coding even in a population of nomi- 
nally redundant cells. 

To evaluate the possible utility of a combinato- 
rial code, we consider a population of MT neurons. 
Experimentally, we have observed many responses 
to each of a finite set of different stimuli 11 . Each 
of the cells in our sample was directionally tuned, 
with relatively similar selectivity and bandwidth when 
responses are normalized (Figure 2a), although the 
population had a wide range of maximal responses. 
Further, in response to a step of stimulus motion at 
the preferred speed and direction, different neurons 
showed considerable diversity in the dynamics of their 
firing rates (Figure 2b). If we assume that each neu- 
ron responds independently to its sensory inputs, then 
we can draw a single trial response from each neu- 
ron in our data base to create a model population re- 
sponse, even though the samples were recorded se- 
quentially from many different neurons. Pooling the 
draws in different ways creates many different hypothetical 
neural populations of different sizes (see 
Methods for details). We can then subject each draw 
to various analyses to evaluate the possible nature of 
the neural code in a population response. Finally we 
create correlated populations and evaluate their impact 
on different coding schemes. 

If we look in a small window of time $\Delta t$, then 
the $i$th cell generates $n_i$ spikes, with $i = 1, 2, \ldots, N$. 
For small values of $\Delta t$, we will almost never see two 
spikes from a single cell. Thus, the response of the 
population $\{n_i\}$ can be treated as an $N$-letter binary 
word, $w$ (a pattern of 1's and 0's), as shown in Figure 2c. 
By keeping track of the combinations of spiking 
and silence across the population, we can ask how 
much information these code words carry about the 
stimulus. At each instant of time, the stimulus in our 
experiments is specified by the direction of motion, $\theta$, 
and the time, $t - t_{\rm onset}$, since the onset of motion, 
and calculations described in the Methods allow us to 
use the experimental data to estimate the information 
that the number or pattern of spikes provides about the 
stimulus, $I(w; \theta, t - t_{\rm onset})$. 
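
The construction of binary words can be sketched in a few lines. The function name `binary_words` and the spike times below are hypothetical and chosen only for illustration; the 8 ms bin follows the window used in Figure 2c.

```python
import numpy as np

def binary_words(spike_times, t_start, t_stop, dt):
    """Convert per-cell spike times (seconds) into binary words, one per bin.

    spike_times: list of 1-D arrays, one array per cell.
    Returns an integer array of shape (n_bins, n_cells) with entries 0 or 1.
    """
    n_bins = int(round((t_stop - t_start) / dt))
    edges = t_start + dt * np.arange(n_bins + 1)
    counts = np.stack([np.histogram(s, bins=edges)[0] for s in spike_times],
                      axis=1)
    return (counts > 0).astype(int)   # two or more spikes in a bin count as 1

# Hypothetical single-trial spike times for N = 5 cells (illustrative only).
trial = [np.array([0.011]),
         np.array([0.003, 0.021]),
         np.array([]),
         np.array([0.002, 0.012, 0.013]),
         np.array([0.030])]

words = binary_words(trial, 0.0, 0.032, 0.008)
counts = words.sum(axis=1)            # pooled count per bin, after clipping
for k, w in enumerate(words):
    print(f"bin {k}: word {''.join(map(str, w))}, count {counts[k]}")
```

Two bins can share the same pooled count while carrying different words, which is the distinction between the count code and the combinatorial code.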

The results in Figure 2d demonstrate that the 
information provided by binary code words increases 
as a function of the number of cells that contribute 
to the word, exceeding one bit for a population of 
16 neurons. If we use the same draws from the experimental 
data to estimate the information that the 
spike count $n = \sum_{i=1}^{N} n_i$ provides about the stimulus, 
$I(n; \theta, t - t_{\rm onset})$, we find that the total amount 
of information from spike counts is smaller than the 
total information from words, and never exceeds 1 bit 
even when all neurons in the sample are pooled to obtain 
spike counts. The combinations of spiking and 
silence in this model population provide more than 
twice as much information as the pooled spike counts, 
even though the cells we are pooling from have nominally 
redundant feature selectivity. 
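
The comparison between words and counts can be made concrete in a small model population. The sketch below is an assumption-laden illustration rather than the estimator described in the Methods: it takes a hypothetical table of spike probabilities per bin, treats cells as independent, enumerates all $2^N$ words, and computes both informations exactly. All numerical values are invented for the example.

```python
import itertools
import numpy as np

def information_bits(p_resp_given_s, p_s):
    """Mutual information I(response; stimulus) in bits from P(response | s)."""
    p_resp = p_s @ p_resp_given_s
    total = 0.0
    for s, ps in enumerate(p_s):
        for r, pr in enumerate(p_resp_given_s[s]):
            if pr > 0:
                total += ps * pr * np.log2(pr / p_resp[r])
    return total

def word_and_count_information(p_spike, p_s):
    """p_spike[s, i]: probability that cell i spikes in one bin given stimulus s."""
    n_stim, n_cells = p_spike.shape
    words = list(itertools.product([0, 1], repeat=n_cells))
    p_word = np.zeros((n_stim, len(words)))
    p_count = np.zeros((n_stim, n_cells + 1))
    for k, w in enumerate(words):
        w = np.array(w)
        # Independent cells: product of spike and silence probabilities.
        pw = np.prod(np.where(w == 1, p_spike, 1.0 - p_spike), axis=1)
        p_word[:, k] = pw
        p_count[:, w.sum()] += pw     # pool all words with the same count
    return information_bits(p_word, p_s), information_bits(p_count, p_s)

p_s = np.array([0.5, 0.5])                       # two equally likely stimuli
diverse = np.array([[0.40, 0.10, 0.25, 0.05],    # hypothetical diverse cells
                    [0.10, 0.35, 0.05, 0.20]])
# Redundant control: every cell uses the population-average probability.
redundant = np.repeat(diverse.mean(axis=1, keepdims=True), 4, axis=1)

words_div, counts_div = word_and_count_information(diverse, p_s)
words_red, counts_red = word_and_count_information(redundant, p_s)
print(f"diverse:   words {words_div:.3f} bits, counts {counts_div:.3f} bits")
print(f"redundant: words {words_red:.3f} bits, counts {counts_red:.3f} bits")
```

Because the count is a function of the word, the word information can never be smaller than the count information; the two coincide in the redundant control and separate once the cells differ.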

To ascertain which feature of the neural re- 
sponse was responsible for the extra stimulus informa- 
tion available from words versus pooled spike counts, 
we next created a number of carefully contrived pop- 
ulations of 10 model Poisson units that preserved ei- 
ther the diversity of time varying firing rates or the 
diversity of direction tuning curves, or that eliminated 

Figure 2: Utility of patterns of spiking and silence across a diverse population of MT neurons for providing 
information about a target motion stimulus. (a) The normalized tuning curves of four MT neurons, showing 
firing rate versus direction of motion. Data were normalized by the response to the preferred direction, relative 
to which all other directions are measured. (b) Responses of the same four neurons, plotting rate (from PSTH) 
versus time in response to a 256 ms step of target speed in the preferred direction. Firing rate curves have been 
offset horizontally to improve visibility. (c) Method for creating words to indicate spiking and silences across a 
population of neurons, in 8 ms windows. In this example, the population response at time "t" is characterized by 
the word: "01010". (d) The information that counts and words carry about the visual motion stimulus plotted as 
a function of the number of neurons in the analysis population. 



all diversity (see Methods). For each population, we 
then performed the same set of information calcula- 
tions that led to Figure 2d. For a population of model 
units that preserved the diversity of firing rate dynamics, 
r(t), but forced all the neurons to have the 
same direction tuning curve (Fig. 3a, filled circles), 
the amount of extra information from words was the 
same as that for the draws from the experimentally 
observed spike trains of MT neurons (open circles). If 
we contrived each unit to have the same time-varying 
trajectory of firing rate r(t), but retained the diversity 
of directional tuning amplitudes, then about half of 
the extra information from words was lost (open tri- 
angles). The extra information that remains reflects 
the fact that tuning curve diversity imposes different 
time-averaged absolute firing rates across the popula- 
tion, even if the trajectory of the trial-averaged firing 
rate r(t) was the same for each model neuron. We 
note that similar results on this latter point were ob- 
tained by Shamir and Sompolinsky, examining the ef- 
fects of simulated heterogeneities in static tuning on 
population codes 17. Finally, if we created populations 
of fully redundant model units with one uniform trajectory, 
r(t), and the same amplitude and width of direction 
tuning curve, then the extra information from 
words was lost, as expected (filled triangles). Analysis 
of information as a function of time revealed that the 
extra information from words was concentrated near 
the time of the onset transients of the neural response, 
where the diversity of response dynamics is greatest 
(data not shown). 
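
The loss of extra information in the fully redundant case follows from a short argument, sketched here under the assumptions that the units are independent and that each bin holds at most one spike per cell. Write $s = (\theta, t - t_{\rm onset})$ for the stimulus and $p(s)$ for the spike probability per bin shared by every unit (both symbols are introduced here for illustration). Then

$$P(w \mid s) = \prod_{i=1}^{N} p(s)^{n_i} \left[1 - p(s)\right]^{1 - n_i} = p(s)^{n} \left[1 - p(s)\right]^{N - n}, \qquad n = \sum_{i=1}^{N} n_i .$$

The likelihood depends on the word only through the count $n$, so $P(s \mid w) = P(s \mid n)$ and therefore $I(w; s) = I(n; s)$. Any difference between word and count information requires that the spike probabilities differ across cells for at least some stimuli.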

What does the extra information tell us? 

The results of the previous section tell us how 
much information the patterns of spiking and silence 
can convey about the stimulus. The next step is to un- 
derstand what these patterns are telling us about the 
stimulus. To focus our attention on a manageable set 
of patterns, we first computed the information carried 
by words and counts for different total spike counts in 
populations of N = 2 ... 16 cells, drawn 100 times 
from our 36 cortical neurons. Figure 4a shows that 
most of the extra information carried by words vs. 

Figure 3: Combinatorial coding is enabled by a diversity 
of response dynamics. Extra information from 
words versus counts is plotted as a function of the 
number of neurons in the analysis population. Dif- 
ferent symbols show the results from populations of 
real and model neurons with different features made 
redundant. Open circles, data drawn from actual sin- 
gle trial responses; filled circles, diversity of response 
dynamics in model neurons mimics that in the data 
but each neuron has been made to have the same time- 
averaged direction-dependent response amplitude (i.e. 
same tuning curve); open triangles, model neurons 
that have the same time-varying firing rate, but re- 
sponse amplitude varies as in the actual data; filled 
triangles, model neurons that have the same time vary- 
ing firing rate and direction-dependent response am- 
plitudes but are independent Poisson processes. 



counts comes from those words with relatively few 
spikes, that is from analysis windows when only a few 
of the neurons in the population emitted spikes and the 
rest were silent. Further, most words had zero, one, or 
two spikes and increasing numbers of spikes were pro- 
gressively less common (Figure 4b). Combining these 
two effects shows that the dominant term in the extra 
information provided by words typically comes from 
instances when only one neuron fired a spike. Even 
when the size of the population was increased to 16, 
most of the extra information still arose from words 
of only one or a few spikes (Fig. 4c). To understand 
what features of the stimulus are represented by dif- 
ferent binary words, we therefore focused on words 
with only one spike. 

Our next step was to construct the response 
conditional ensembles 18 , the distribution of stimuli 
that were associated with a particular neural response. 
We can think of these ensembles as "receptive fields" 
for the population response defined by the occurrence 
of a particular pattern of spiking and silences, and of 
the process used to create them as a population word 
variant of spike-triggered averaging. In the color maps 
of Figure 5, the color of each pixel shows the probabil- 
ity of a given direction of stimulus motion at a given 
time between motion onset and the time of the word 
in question. The responses were assembled across all 
stimulus motions and all times for a sample of 9 neu- 
rons. The occurrence of n = 1 spike in a population 
of N = 9 neurons is highly ambiguous in terms of the 
stimulus that elicited it, as seen in Figure 5a, where 
the red ring shows the wide range of stimulus direc- 
tions and times that had high probabilities for a count 
of 1 spike. 

The event that contains one spike from nine 
neurons is composed of nine possible binary words, 
from 100000000 through 000000001, in which each 
single neuron spikes and all others are silent. Fig- 
ure 5b shows that each binary word points to a differ- 
ent distribution of stimuli, and that each word actually 
represents a quite narrow range of stimulus directions 
and times from motion onset. Importantly, the binary 
words go a long way toward resolving the ambiguity 
between motion direction and motion onset time that 

Figure 4: Added information from words analyzed separately for each spike count. (a) Comparison of information 
from patterns of spiking and silences versus counts as a function of the number of spikes in the analysis 
window. Information from words was calculated by averaging the information from words of a given count. (b) 
The probability of observing each given spike count in 100 populations of 10 MT neurons. (c) Information from 
words minus information from counts is plotted separately for each spike count and each number of neurons in 
the analysis population. Connected sets of symbols show data for all counts in a given population size. 



is present in the neural data analyzed for Figure 5a, 
but not in a behavior driven by the MT population re- 
sponse, namely smooth pursuit eye movements 19 . No- 
tice that if the neurons really were redundant, as one 
might have thought from their tuning curves, each of 
the events would have to point to the same distribution 
of stimuli, and each word would be associated with a 
distribution identical to that found by counting the total 
number of spikes. The extra information in 1-spike 
words versus counting 1 spike is a general property of 
our cortical population and a similar plot to Figure 5 
could be constructed for any group of cells. 
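
The relationship between the count-conditional and word-conditional ensembles can be sketched with Bayes' rule. The example assumes a hypothetical population of three independent cells and four stimuli with a uniform prior; the probabilities are invented for illustration and the function names are not from the Methods.

```python
import numpy as np

# Hypothetical spike probabilities per 8 ms bin, p_spike[s, i], for 4 stimuli
# (e.g. direction/time combinations) and N = 3 cells. Illustrative values only.
p_spike = np.array([[0.30, 0.05, 0.05],
                    [0.05, 0.30, 0.05],
                    [0.05, 0.05, 0.30],
                    [0.10, 0.10, 0.10]])
p_s = np.full(4, 0.25)                      # assumed uniform stimulus prior
n_cells = p_spike.shape[1]

def p_word_given_s(word):
    """P(word | s) for independent cells; word is a tuple of 0s and 1s."""
    w = np.array(word)
    return np.prod(np.where(w == 1, p_spike, 1.0 - p_spike), axis=1)

def posterior(likelihood):
    """Bayes' rule: P(s | response) from P(response | s)."""
    joint = likelihood * p_s
    return joint / joint.sum()

# All words with exactly one spike: (1,0,0), (0,1,0), (0,0,1).
one_spike_words = [tuple(int(i == j) for i in range(n_cells))
                   for j in range(n_cells)]
word_likelihoods = np.array([p_word_given_s(w) for w in one_spike_words])
count_likelihood = word_likelihoods.sum(axis=0)          # P(n = 1 | s)

p_s_given_count = posterior(count_likelihood)
p_s_given_word = np.array([posterior(lik) for lik in word_likelihoods])
# Weight of each word among all one-spike events, P(word | n = 1).
p_word_given_count = (word_likelihoods @ p_s) / (count_likelihood @ p_s)

print("P(s | n = 1):", np.round(p_s_given_count, 3))
for w, post in zip(one_spike_words, p_s_given_word):
    print(f"P(s | w = {''.join(map(str, w))}):", np.round(post, 3))
```

The count-conditional ensemble is the mixture of the word-conditional ensembles weighted by P(word | n = 1), so it is necessarily at least as broad as its components.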

Figure 5 demonstrates that different patterns of 
activity across a population of MT neurons can repre- 
sent different stimuli, but not necessarily that the com- 
bination of spiking and silence is telling us anything 
that the spikes alone do not. We tested this directly 
(Fig. 6) by constructing a set of response conditional 
ensembles based on keeping track of the spike from a 
single cell and progressively discarding knowledge of 
silence in other cells. While the combination of spiking 
in neuron #1 and silence in the rest of the population 
(100000000, upper left color map) points to a 
specific, small area in the space of stimuli, specificity 
declines in the representation as we throw away the 
knowledge of silence in more and more cells. Finally, 
the occurrence of a spike in neuron #1 with no knowledge 
about the state of the other cells (1********, 
lower right color map) points to a large area with tens 
of degrees of uncertainty about motion direction and 
hundreds of milliseconds of uncertainty about the time 
of motion onset. In the example illustrated in Figure 6, 
it is striking that the broad, uncertain distribution in the 
lower-right panel has almost no overlap 
with the original distribution of stimuli conditional on 
spiking in neuron #1 and silence in the others: combi- 
nations of spikes and silence not only carry more in- 
formation than spike counts alone, but they also stand 
for very different events in the sensory input. Synergy 
of spikes and silence was a common feature of our MT 
data, as observed previously in the retina 20. For 10-cell 
groups, approximately 30% of all 1-spike words 
have significant spike-silence synergy. The prevalence 

Figure 5: Response conditional stimulus ensembles 
for binary code words corresponding to a spike count 
of n = 1 in a population of N = 9 neurons. (a) The 
distribution of directions of motion and delays from 
motion onset, $P(\theta, t - t_{\rm onset} \mid n)$, given that the population 
of cells produced a total of one spike in a window 
of size $\Delta t$ = 8 ms. (b) The same analysis, but now 
performed separately for each combination of spiking 
and silence where one neuron emitted a spike and all 
the others were silent. The probabilities in (a) have 
been normalized so that the total probability in the 
square is unity, with red representing the highest and 
dark blue the lowest values. The distributions in the 
small panels are normalized so that the average of all 
nine small panels yields the distribution in (a). Graphs 
are based on analysis of draws from actual data in one 
group of 9 MT neurons. N = 9 was chosen to allow 
the 3x3 presentation. 

Figure 6: Contributions of silences in other neurons to 
the distribution of stimuli conditional on a spike in one 
neuron. Each pixel indicates the probability of a given 
direction of target motion at a given time after the onset 
of target motion, given a particular word of spiking 
and silences across the population of 9 MT neurons. 
The string above each color map indicates the word 
that was used to create each response conditional ensemble, 
where a "1" or "0" indicates the presence or 
absence of a spike in a neuron and a "*" indicates a 
wildcard so that an interval was included in the average 
whether a spike was present or absent. Further 
analysis revealed that the 100000000 and 1******** 
words contain 0.71 and 0.44 bits of information, respectively, 
about the stimulus. 



of synergy increases with N: more than 60% of 16-cell, 
1-spike words are synergistic. 
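
The effect of discarding knowledge of silence can be sketched with wildcard patterns. The example below assumes three independent cells with invented spike probabilities, and it measures the information carried by a pattern as the Kullback-Leibler divergence between the pattern-conditional stimulus distribution and the prior. That measure is an assumption made for this sketch and may differ from the definition used for the values quoted in Figure 6.

```python
import numpy as np

# Hypothetical spike probabilities p_spike[s, i] for 4 stimuli and 3 cells.
# Cell 0 is weakly tuned; cells 1 and 2 fire strongly for some stimuli.
p_spike = np.array([[0.30, 0.02, 0.02],
                    [0.25, 0.60, 0.02],
                    [0.25, 0.02, 0.60],
                    [0.20, 0.60, 0.60]])
p_s = np.full(4, 0.25)                     # assumed uniform stimulus prior

def pattern_likelihood(pattern):
    """P(pattern | s); '1' = spike, '0' = silence, '*' = ignore that cell."""
    like = np.ones(p_spike.shape[0])
    for i, symbol in enumerate(pattern):
        if symbol == "1":
            like = like * p_spike[:, i]
        elif symbol == "0":
            like = like * (1.0 - p_spike[:, i])
    return like

def pattern_posterior(pattern):
    """P(s | pattern) by Bayes' rule."""
    joint = pattern_likelihood(pattern) * p_s
    return joint / joint.sum()

def pattern_information(pattern):
    """KL divergence (bits) between P(s | pattern) and the prior P(s)."""
    post = pattern_posterior(pattern)
    return float(np.sum(post * np.log2(post / p_s)))

for pattern in ("100", "10*", "1**"):
    print(pattern, np.round(pattern_posterior(pattern), 3),
          f"{pattern_information(pattern):.3f} bits")
```

In this toy population the spike in cell 0 alone says little about the stimulus, while the accompanying silence of the other cells rules out the stimuli that would have driven them, sharpening the conditional distribution.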

Effect of neuron-neuron correlations 

So far, our discussion of the population re- 
sponses in MT has assumed that the cells respond in- 
dependently to sensory inputs. We ignored correla- 
tions between the responses of different neurons not 
just for simplicity, but also to give the classical model 
of averaging over multiple redundant cells the greatest 
chance to succeed. We found that the diversity of tem- 
poral dynamics in neuron responses makes a substan- 
tial change in the structure of the problem, opening 
the possibility for a form of combinatorial coding. We 
now ask whether correlations among neural responses 
alter the utility of a combinatorial code. 

Suppose that we know the average correlation 
coefficient between pairs of cells in a neural popula- 
tion. We would like to construct model population 
responses that are consistent with this level of corre- 
lation, and of course also with the observed time de- 
pendent firing rates. There are many ways to construct 
correlated populations, some of which correspond to 
complicated patterns of correlation which will give an 
obvious advantage to combinatorial codes. To avoid 
this, we used the one parameter model described in 
the Methods to achieve a model population with a predefined 
mean level of correlations, but with a distribution 
that is otherwise as random as possible. The 
parameter in our model is a "coupling," J (see Equa- 
tion 11) that we varied systematically to control the 
average pairwise correlation, which we assessed for 
each model population. We then computed the statis- 
tics of the model population responses with different 
levels of mean correlation, and examined the informa- 
tion content of these responses, as before. 

In Figure 7, we illustrate the impact of corre- 
lations on the information encoded by populations of 
N = 10 neurons. As expected from prior work 21-27, 
the information available from counting spikes is re- 
duced when we add positive correlations among cells 
because it increases the trial-by-trial variance of the 
spike counts we obtain by summing across the neu- 
rons in a population. In contrast, negative correlations 
reduce the count variance and enhance information 
transmission. For coding based on patterns of spiking 
and silence, small positive correlations also cause a 
slight drop in information that reverses as correlations 
become stronger, increasing the advantage of the com- 
binatorial code over the spike count code at high levels 
of correlation. Across correlation levels, the extra in- 
formation from a code based on words versus counts 
is greater than or approximately equal to that found 
in the independent population. Thus, our conclusions 
about the opportunities for combinatorial coding are 
robust across a wide range of correlation strengths, 
including those observed experimentally, which are 
usually in the range of 0.1 to 0.2 25,28-33. We conclude 

Figure 7: Impact of neuron-neuron correlations on 
coding based on population patterns of spikes and silence 
versus spike counts. The values on the x-axis 
are the calculated correlations between pairs of separately 
sampled units after setting up correlated populations 
using Equations 11-13. Filled and open circles 
show information about the stimulus from counts versus 
patterns of spiking and silence, respectively. Data 
are shown as means and standard deviations across 50 
groups of N = 10 cells drawn from our experimental 
sample of 36 neurons. 



that combinatorial codes neither require exotic corre- 
lations among neurons, nor are they disrupted by the 
modest levels of correlation consistent with available 
data. 
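
The effect of correlations on the count code can be seen from the variance of the pooled count. As a simplified sketch, assume that every cell has the same count variance $\sigma^2$ in a given window and that every pair has the same correlation coefficient $\rho$ (both symbols are introduced here for illustration and are not the parameters of the model in the Methods). Then

$$\mathrm{Var}\!\left(\sum_{i=1}^{N} n_i\right) = \sum_{i=1}^{N} \mathrm{Var}(n_i) + \sum_{i \neq j} \mathrm{Cov}(n_i, n_j) = N\sigma^2 \left[1 + (N - 1)\rho\right].$$

For $\rho > 0$ the variance of the pooled count exceeds the independent value $N\sigma^2$ and grows as $N^2$ in large populations, which degrades a count code. For $\rho < 0$ the variance is reduced, subject to the constraint $\rho \geq -1/(N - 1)$.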

Discussion 

For many years, the discussion of neural cod- 
ing could be summarized along an axis that had two 
endpoints: "rate" vs. "timing" codes 34-36 . There is 
no question that neurons respond to sensory inputs by 
changing the rate at which they generate action po- 
tentials, and many therefore believe that rate coding 
is firmly established. In contrast, codes based on the 
timing of spikes, whether in sequence from a single 
neuron or across a population, have been viewed as 
more speculative. Our goal in this paper has been to 
replace the "rate vs. timing" debate with a sharper set 
of questions about the nature of the neural code, and to 
suggest that these questions have surprising answers. 

Our argument begins with experimental support 
for a purely conceptual point: "rate," as it is defined 
experimentally, cannot be a symbol in a code. Sym- 
bols or code words must be observable directly in sin- 
gle trials, as with the letters or words in text. In con- 
trast, rate refers to the probability of generating spikes 
and hence is by definition an average property of spike 
trains over many trials. The point that rate cannot be a 
symbol in the code would be pedantic if neurons oper- 
ated in a regime where they generate large numbers of 
spikes before the rate has a chance to change signifi- 
cantly. In this limit, rate can be estimated by counting 
spikes so that a "spike count" code becomes a reason- 
able surrogate for a "rate code". But real neurons do 
not operate in this limit. For example, cells in MT pro- 
vide most of their information about motion direction 
with just a few spikes 11 , and estimates of the underly- 
ing rate based on counting spikes in reasonable time 
windows are both uninformative (Figure 1d) and 
systematically in error (Figure 1c). These problems are 
not ameliorated by averaging across a population of 
neurons with similar direction selectivity (Figure 1c). 

The second step in our argument is to real- 
ize that even precise statements about the statistics of 
spike trains do not yield unique conclusions about the 
symbolic structure of the code. Thus, even for neurons 
whose spike train statistics are completely described 
by their underlying time-varying rate, estimates of the 
underlying rate from single spike trains depend on the 
timing of individual spikes. The intuition that Poisson 
statistics imply a code based on spike counts is valid 
only in the limit where rates are nearly time independent. 
Again, this analytical result is strongly reflected 
in the operation of real neurons: for cells in MT, the 
exact timing of spikes within a reasonable window is 
related to fluctuations of 25% in our best estimate of 
the underlying rate (Figures 1a and b). The fact that 
spike timing is critical for rate estimation in practice, 
as well as in principle, requires revision of the usual 
formulation of "rate vs. timing" arguments. 



The third step in our argument arises from re- 
alizing that the more important issue in neural cod- 
ing and decoding is not the patterns of spike timing 
in single neurons, but rather its parallel in the patterns 
of spikes and silence across a population of neurons. 
We found that patterns of spiking and silence across a 
neural population contain twice as much information 
about a sensory stimulus as does the spike count. The 
extra information arises from the diversity of dynami- 
cal response properties across a population of neurons 
that otherwise have very similar tuning curves (e.g., 
for the direction of motion in MT). Indeed, it is this di- 
versity of response dynamics that limits the effective- 
ness of simple averaging strategies, and, as such, cor- 
tical populations are only nominally redundant. Be- 
cause we are considering the information carried by 
patterns in a single small time window, these results 
are unaffected by correlations across time (Poisson vs. 
non-Poisson statistics), and we have checked that they 
are also robust against reasonable levels of correlation 
among pairs of cells. We refer to the code defined 
by population words as combinatorial, because it de- 
pends in a critical way on the combinations of spikes 
and silence across the neural population. Indeed, for 
particular code words, the combined response of the 
population carries more information than would be 
expected by adding up the information carried by the 
responses of the individual cells (Fig. 6). 

We have presented our arguments in the con- 
crete context provided by the coding of visual mo- 
tion in area MT of the primate cortex, but the results 
should be much more general. Certainly the theoreti- 
cal difficulties with the traditional formulation of the 
coding problem as rate vs. timing are completely general, 
as seen from Equation 3. The quantitative results 
on the magnitude of the extra information carried in a 
combinatorial code depend on the details of the neu- 
ral population we are considering, but we emphasize 
that there is nothing extreme about the population of 
cells we have analyzed in detail. While the presence 
of extra information in the combinatorial code does 
not mean that the brain uses this information to guide 
behavior, it is crucial that what might have seemed like 
an exotic coding scheme does not in fact depend upon 
the existence of unusual structures in the spike trains, 
either of single neurons or of populations. Rather, the 
possibility of combinatorial coding is a previously 
unappreciated consequence of the well-known response 
dynamics of neurons throughout the 
cortex. While these cells encode stimuli by changing 
their firing rates, the elementary symbols of this code 
are the combinations of individual spikes and silences 
across the population of cells. 

Methods 

Experimental methods 

Experimental data have been published 
previously 11 . To acquire these data, extracellular 
single-unit microelectrode recordings were made 
in 3 sufentanil-anesthetized, paralyzed monkeys 
(Macaca fascicularis) according to a protocol that had 
been approved in advance by the Institutional Animal 
Care and Use Committee at UCSF. Using random 
dot texture stimuli presented on a high-resolution 
analog oscilloscope display, we mapped receptive 
field location, determined the preferred direction 
and speed of the neuron under study, and sized the 
stimuli to maximally excite each neuron. The random 
dot texture was moved behind a stationary aperture, 
creating a moving stimulus at a fixed retinal location. 

Visual stimuli were presented in discrete tri- 
als. Each stimulus appeared and remained station- 
ary for 256 ms, then stepped to a constant velocity 
for 256 ms, and was again stationary for 256 ms. 
A brief pause separated successive trials, and direc- 
tions of motion were pseudorandomly interleaved. A 
typical experiment included 13 motion directions that 
spanned ±90 degrees around the neuron's preferred 
direction in 15 degree increments. Each stimulus was 
presented up to 222 times. Spike times were recorded 
with 10 microsecond resolution. 

Constructing a model population 

From the independently recorded single unit re- 
sponses, we constructed a model of the population 
response to a motion stimulus. To create a popula- 
tion with nominally redundant feature selectivity, we 



aligned all cells by their preferred direction. Then, we 
resampled the rasters of individual cells at Δt = 8 ms 
resolution, labelling the occupancy of each time bin 
with a "1" if there had been one or more spikes in the 
time interval or a "0" if there had not. At this resolution, 
multiple spikes in a single bin were infrequent, 
occurring in fewer than 10% of the spiking events we 
recorded. We then created binary population "words", 
defined as patterns of 1's and 0's, at each time point 
during the response by randomly drawing the N letters 
of each word from the collection of stimulus repetitions 
from all N cells in our sample in the appropriate bin. 
Each neuron in our sample corresponded to a fixed 
position in the word, and we could construct many 
different words by random draws from the many 
repetitions of each stimulus for each neuron. The 
probability of observing a particular word then was 
measured by estimating the frequency of occurrence of 
that pattern of 1's and 0's within the entire dataset. 
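
The word construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names (`binarize`, `build_words`, `word_codes`) and the array layout (one trials-by-bins binary raster per neuron) are hypothetical choices made for the example.

```python
import numpy as np

def binarize(spike_times_ms, t_start, t_stop, dt=8.0):
    """Turn one trial's spike times into a 0/1 occupancy vector (bin width dt)."""
    n_bins = int(round((t_stop - t_start) / dt))
    edges = t_start + dt * np.arange(n_bins + 1)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    # "1" if one or more spikes fell in the bin, "0" otherwise
    return (counts > 0).astype(np.uint8)

def build_words(rasters, n_words, rng):
    """Draw population words from independently recorded cells.

    rasters: list of N binary arrays, each of shape (n_trials_i, n_bins),
             all for the same stimulus (assumed layout).
    Returns an array of shape (n_words, n_bins, N): neuron i always
    occupies position i in the word.
    """
    n_bins = rasters[0].shape[1]
    words = np.empty((n_words, n_bins, len(rasters)), dtype=np.uint8)
    for i, raster in enumerate(rasters):
        # For every word and every time bin, pick a random repetition
        # of this neuron and copy its letter from the matching bin.
        trial = rng.integers(0, raster.shape[0], size=(n_words, n_bins))
        words[:, :, i] = raster[trial, np.arange(n_bins)]
    return words

def word_codes(words):
    """Encode each binary word as an integer so patterns can be counted."""
    n = words.shape[-1]
    return words.astype(np.int64) @ (1 << np.arange(n, dtype=np.int64))
```

Word probabilities at a given time bin can then be estimated from the integer codes, for example with `np.bincount(codes[:, t], minlength=2**N) / n_words`.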

Estimating information 

To estimate the information carried by population 
words about the stimulus, we first computed the 
probability of observing particular N-neuron words 
from our dataset, $P(\mathbf{n} = \{n_i\})$, where $i$ labels the 
neurons, over all time and for all motion directions. The 
total entropy of the words is given by: 

$$S[P(\mathbf{n})] = -\sum_{\mathbf{n}} P(\mathbf{n}) \log_2 P(\mathbf{n}) \qquad (4)$$

where $\mathbf{n}$ is a label that indexes word identity. The 
probability of observing a word for a particular stimulus, 
$P(\mathbf{n}|\theta, t)$, was estimated in a similar manner to $P(\mathbf{n})$ 
but at a particular time $t$ relative to the onset of motion 
in direction $\theta$. The entropy of the conditional 
distributions is given by: 

$$S[P(\mathbf{n}|\theta, t)] = -\sum_{\mathbf{n}} P(\mathbf{n}|\theta, t) \log_2 P(\mathbf{n}|\theta, t). \qquad (5)$$

The average amount of information that words carry 
about the stimulus is given by the difference between 
the total entropy and the average noise entropy: 

$$I_{\rm words} = S[P(\mathbf{n})] - \frac{1}{13\,T} \sum_{\theta} \sum_{t} S[P(\mathbf{n}|\theta, t)] \qquad (6)$$

where $T$ represents the total number of time bins 
in the response and $\theta$ indexes the 13 motion directions. 
In a similar way, we can compute the information 
from counts using the same $P(\mathbf{n})$ as before, 
but collapsing over words with the same number of 
spikes, such that $P({\rm count} = n) = \sum_{\mathbf{n}} P(\mathbf{n})$, where 
the sum runs over all words, $\mathbf{n}$, with count equal to 
$n = \sum_i n_i$. Similarly, $P({\rm count} = n|\theta, t) = \sum_{\mathbf{n}} P(\mathbf{n}|\theta, t)$. 
With this $P({\rm count})$ in hand, we compute $I_{\rm counts}$ 
in a completely analogous fashion to the calculation 
of $I_{\rm words}$. 
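
The two information measures can be sketched in a few lines. This is a plug-in estimate with no finite-sampling correction, offered as an illustration only: the function names and the input layout (one row of integer-coded symbols per stimulus condition, where a condition is one direction and time bin, with equal numbers of samples per condition) are assumptions of the example.

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def information_bits(symbols, n_symbols):
    """Total entropy minus average noise entropy (plug-in estimate).

    symbols: integer array of shape (n_conditions, n_samples); each row
             holds the symbols observed for one (direction, time bin) pair.
    """
    p_cond = np.stack(
        [np.bincount(row, minlength=n_symbols) / row.size for row in symbols]
    )
    # With equal samples per condition, P(n) is the mean of P(n | theta, t).
    p_total = p_cond.mean(axis=0)
    noise = np.mean([entropy_bits(p) for p in p_cond])
    return entropy_bits(p_total) - noise

def codes_to_counts(codes, n_neurons):
    """Collapse integer-coded binary words onto their spike counts."""
    bits = (codes[..., None] >> np.arange(n_neurons)) & 1
    return bits.sum(axis=-1)

# Toy case: two conditions that always evoke the words 01 and 10.
codes = np.array([[1, 1, 1, 1], [2, 2, 2, 2]])
i_words = information_bits(codes, n_symbols=4)
i_counts = information_bits(codes_to_counts(codes, 2), n_symbols=3)
```

In this toy case the two words identify the condition perfectly, while both words contain exactly one spike, so the count carries no information about which condition occurred.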

In those cases where we generate samples of 
population words directly from observed spike trains, 
all entropy estimates were corrected for finite sam- 
pling effects by taking multiple random samples of 
fractions of the dataset and then performing a linear 
extrapolation to infinite sample size 37 . Errors in es- 
timates were estimated by extrapolating the standard 
deviation of values computed from half the sample in 
the same manner. Because we ask only about words 
formed from responses in a single time bin, correla- 
tions between time bins (and hence the question of 
whether the neurons are exactly Poisson) are irrele- 
vant; as a test of our computations we created shuffled 
spike trains with exact Poisson statistics, and repro- 
duced all of our results. 
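
A finite-sampling correction of this kind can be sketched as below. The text specifies a linear extrapolation to infinite sample size; taking the inverse sample size as the abscissa, and the particular fractions and number of resamples, are assumptions of this example rather than details given here. The function name is hypothetical.

```python
import numpy as np

def extrapolated_entropy(samples, n_symbols,
                         fractions=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5),
                         n_repeats=10, rng=None):
    """Entropy in bits, linearly extrapolated to infinite sample size.

    samples: 1-D array of integer-coded symbols.
    """
    rng = np.random.default_rng() if rng is None else rng
    inv_sizes, mean_entropies = [], []
    for f in fractions:
        m = max(2, int(round(f * samples.size)))
        values = []
        for _ in range(n_repeats):
            # Random subsample containing a fraction f of the data
            sub = rng.choice(samples, size=m, replace=False)
            p = np.bincount(sub, minlength=n_symbols) / m
            p = p[p > 0]
            values.append(-np.sum(p * np.log2(p)))
        inv_sizes.append(1.0 / m)
        mean_entropies.append(np.mean(values))
    # Fit entropy = intercept + slope / size; the intercept is the
    # estimate at 1/size -> 0.
    slope, intercept = np.polyfit(inv_sizes, mean_entropies, 1)
    return float(intercept)
```

The same routine can be applied separately to the total entropy and to each conditional entropy before forming the information difference.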

Calculations based on a model population sam- 
pled from the real data have a strong intuitive con- 
nection to experiment, motivating us to use the ap- 
proach outlined above to make estimates of informa- 
tion based on real data. However, we found that work- 
ing with real data was unsatisfactory in some ways; in- 
formation rates converge only for a very large number 
of samples, which becomes increasingly cumbersome 
as the number of neurons, N, exceeds 16. Because the 
statistics we wish to reproduce in this model popula- 
tion are just the time dependent, experimentally ob- 
served firing rates for each neuron, it was possible to 
do the same analysis after simply calculating P(word) 
using Equation 10 (see below). This approach works 
because the spike rate of each model neuron, at each 
moment of time, is determined by our experimental 
data with small error bars, and hence there are no free 
parameters in the construction of our model population. 
We checked that this approach yielded the same 
answers as the data-based approach for values of N 
where the calculations were tractable. In those cases 
where we computed word probabilities directly from 
Equation 10, we propagated the errors in measured 
firing rates to obtain errors in the derived information 
measures. 

To create the model populations with identical 
time varying firing rates or directional tuning curves 
used to generate the data in Figure 3, we used each 
neuron in turn as a template. For each group of 
10 model cells, we randomly chose another neuron 
from the population, using its tuning curve, $\bar{r}(\theta)^*$ 
(the bar indicates a time average), and the shape of 
its temporal modulations in rate at the preferred 
orientation, $r(t)^* = r(t, \theta = 0)/\bar{r}(\theta = 0)$, to serve 
as a template for fixing the tuning or firing rate 
dynamics of the group. To fix the tuning of the 
population, we allowed each cell to retain its own $r(t)$, 
but rescaled each curve by a constant factor which 
forced the cell's tuning curve to follow $\bar{r}(\theta)^*$, such 
that $r(t, \theta) = [r(t, \theta)/\bar{r}(\theta)] \cdot \bar{r}(\theta)^*$. To fix the firing 
rate dynamics, each cell retained its own directional 
tuning curve, but temporal dynamics were set by the 
template, $r(t)^*$, so that $r(t, \theta) = r(t)^* \, \bar{r}(\theta)$. 
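
The two rescaling operations can be sketched as array operations. The layout of `rates` (neurons by time bins by directions), the function names, and the small `eps` guard against division by zero are assumptions made for this illustration.

```python
import numpy as np

def fix_tuning(rates, template, eps=1e-12):
    """Give every cell the template's time-averaged tuning curve.

    rates: array of shape (n_neurons, n_bins, n_directions).
    Each cell keeps its own time course; each direction is rescaled by a
    constant so that the time-averaged rate follows the template.
    """
    rbar = rates.mean(axis=1, keepdims=True)      # (N, 1, D) tuning curves
    target = rbar[template]                       # (1, D) template tuning
    return rates / np.maximum(rbar, eps) * target

def fix_dynamics(rates, template, pref):
    """Give every cell the template's time course; keep each cell's tuning.

    pref: index of the preferred direction (theta = 0 after alignment).
    """
    rbar = rates.mean(axis=1, keepdims=True)      # (N, 1, D)
    # Template time course at the preferred direction, normalized to mean 1
    shape = rates[template, :, pref] / rbar[template, 0, pref]
    return shape[None, :, None] * rbar
```

After `fix_tuning`, all cells share one tuning curve but differ in dynamics; after `fix_dynamics`, all cells share one normalized time course but keep their own tuning curves.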

Spike-silence synergy 

To measure the synergy between spikes and 
silences in our population words, we simply took the 
difference between the stimulus information that the word 
captured and the sum of the information from each 
component spike and silence 20,38 , 

$$\mathrm{Synergy} = I(\{n_i\}) - \sum_i I(n_i) \qquad (7)$$

The stimulus information, $I$, is computed as in 
Brenner et al., 2000 38 , and is given by: 

$$I(\mathrm{stimulus}) = \frac{1}{T} \int_0^T dt \, \frac{r(t)}{\bar{r}} \log_2\!\left(\frac{r(t)}{\bar{r}}\right) \qquad (8)$$

where $r(t)$ is the modulation of the rate for a given 
event, the occurrence of a given word or a spike or 
silence from a particular cell, and $\bar{r}$ is its time average. 
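
A discretized version of the synergy calculation can be sketched as follows. The example assumes conditionally independent cells (so that the probability of a word is the product of its letter probabilities) and replaces the time integral by an average over equally weighted conditions; both are simplifications chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def event_information(p_event):
    """Information per event in bits, Equation 8 in discrete form.

    p_event: probability of the event in each condition (for example,
             each time bin of each stimulus), all equally weighted.
             The event must occur with nonzero mean probability.
    """
    p = np.asarray(p_event, dtype=float)
    ratio = p / p.mean()
    out = np.zeros_like(ratio)
    nz = ratio > 0
    out[nz] = ratio[nz] * np.log2(ratio[nz])      # x log2 x -> 0 as x -> 0
    return float(out.mean())

def word_synergy(q, word):
    """Information from a word minus the summed information from its letters.

    q:    array (n_conditions, N) of spike probabilities per bin.
    word: length-N binary pattern of spikes (1) and silences (0).
    """
    q = np.asarray(q, dtype=float)
    word = np.asarray(word)
    # Probability of each letter: q for a spike, 1 - q for a silence
    p_letters = np.where(word == 1, q, 1.0 - q)
    p_word = p_letters.prod(axis=1)               # independence assumption
    letters = sum(event_information(p_letters[:, i]) for i in range(q.shape[1]))
    return event_information(p_word) - letters
```

A positive value indicates that the word is more informative than the sum of its parts; a negative value indicates redundancy among the letters.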

Constructing a correlated population 

We continue to work in small time windows, of 
duration $\Delta t$, such that the response of each neuron 
$i$ consists either of a spike ($n_i = 1$) or silence ($n_i = 0$). 
Then if all cells respond independently, we can 
write the probability distribution for the population's 
response $\{n_i\}$ at some moment of time $t$ in the form: 

$$P(\{n_i\}|t) = \prod_{i=1}^{N} q_i(t)^{n_i} \, [1 - q_i(t)]^{1 - n_i} \qquad (9)$$

where $q_i(t)$ denotes the probability of a spike from the 
$i$th neuron at time $t$, and in the limit $\Delta t \to 0$ we can 
identify the time dependent firing rate of each neuron 
$r_i(t) = q_i(t)/\Delta\tau$. Equation 9 can be rewritten as: 



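$$P(\{n_i\}|t) = \frac{1}{Z(t)} \exp\left[ \sum_{i=1}^{N} \phi_i(t) \, n_i \right] \qquad (10)$$

where $Z(t)$ is a normalization constant and 
$\phi_i(t) = \ln[q_i(t)/(1 - q_i(t))]$, which reduces to $\ln q_i(t)$ 
when $q_i(t) \ll 1$. This form suggests that we can add 
correlations among neurons by adding an explicit term to the 
exponential that couples the responses of the different 
cells: 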



$$P(\{n_i\}|t) = \frac{1}{Z(t)} \exp\left[ \sum_{i=1}^{N} \phi_i(t) \, n_i + J \sum_{i=1}^{N} \sum_{j \neq i} n_i n_j \right] \qquad (11)$$



In the independent model, there are no correlations 
between the responses $n_i$ and $n_j$ once we know the 
stimulus, while Equation 11 predicts that there will 
be non-zero correlations; for small $J$, the strength of 
these correlations is proportional to $J$. In fact, 
Equation 11 is the least structured, or maximum entropy, 
model that generates some average level of correlations 
among all the pairs of cells. 
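
For small populations the coupled distribution can be evaluated by enumerating all 2^N words. The sketch below assumes the coupling term has the form J times the sum over ordered pairs of n_i n_j, with no additional numerical factor; that reading of Equation 11, and the function names, are assumptions of this example.

```python
import itertools
import numpy as np

def all_words(n):
    """All 2^n binary words as rows of a float array."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def pairwise_model(phi, J, words):
    """Probability of every word for fields phi and uniform coupling J."""
    k = words.sum(axis=1)
    # For binary letters, sum_i sum_{j != i} n_i n_j = k (k - 1)
    log_w = words @ phi + J * k * (k - 1.0)
    log_w -= log_w.max()                  # stabilize the exponential
    p = np.exp(log_w)
    return p / p.sum()                    # division by Z(t)

# With J = 0 and phi = ln[q / (1 - q)] the model is the independent one.
q = np.array([0.1, 0.2, 0.3])
words = all_words(3)
p_indep = pairwise_model(np.log(q / (1 - q)), 0.0, words)
p_coupled = pairwise_model(np.log(q / (1 - q)), 0.2, words)
```

Note that turning on J while holding the fields fixed also changes the single-cell spike probabilities, which is why the fields must be re-solved under the rate constraints.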

To produce a model population of neurons with 
an average pairwise coupling, $J$, which respects each 
cell's average firing rate as a function of time $r_i(t)$, we 
need to solve for the $\phi_i(t)$'s in Equation 11 subject to 
the constraints: 

$$\sum_{\mathbf{n}} n_k \, P(\mathbf{n}|t) = r_k(t) \, \Delta\tau \qquad (12)$$

where the $r_k(t)$ are measured single cell firing rates, 
averaged over a small time window, $\Delta t$ = 8 ms. 



Since the cells are not coupled in time, we can solve 
for the fields at each time point independently. We 
have an analytical solution for the fields with $J = 0$ 
and we can proceed from this solution using perturbation 
theory, from which we obtain an equation relating 
small changes in $J$ to their effect on the fields, $\phi$: 

$$\Delta\phi_i = \Delta J \sum_{k} \sum_{\alpha} \sum_{\beta \neq \alpha} \left[ \langle n_\alpha n_\beta \rangle \langle n_k \rangle - \langle n_\alpha n_\beta n_k \rangle \right] \chi^{-1}_{ik} \qquad (13)$$

where $\alpha$ and $\beta$ index neurons in the group of cells and 
$\chi$ is the connected part of the two-point correlation 
function, $\chi_{ik} = \langle n_i n_k \rangle - \langle n_i \rangle \langle n_k \rangle$. We solve for 
the fields at very small increments, $\Delta J/J = 0.001$, 
checking satisfaction of the constraints on the firing 
rates at each step. This perturbative approach is fast, 
but accumulates errors. To correct for the accumulated 
errors we perform local function minimization 
whenever the fractional error in the single cell rates 
exceeds $10^{-8}$, and then return to the perturbative 
stepping until the error bound is again reached. 
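
The stepping procedure can be sketched for a small population by enumerating all words. Several choices below are assumptions made for this illustration and are not taken from the text: the coupling term is read as J times the sum over ordered pairs, J is increased in uniform steps starting from zero, and the error correction uses Newton iterations on the fields as a stand-in for the local function minimization. The function names are hypothetical.

```python
import itertools
import numpy as np

def model_probs(phi, J, words):
    """Word probabilities for fields phi and uniform coupling J."""
    k = words.sum(axis=1)
    log_w = words @ phi + J * k * (k - 1.0)
    p = np.exp(log_w - log_w.max())
    return p / p.sum()

def moments(phi, J, words):
    """Mean <n_k> and connected two-point function chi_ik."""
    p = model_probs(phi, J, words)
    mean = p @ words
    chi = (words.T * p) @ words - np.outer(mean, mean)
    return p, mean, chi

def step_fields(phi, J, dJ, words):
    """First-order change in the fields that keeps <n_k> fixed (Equation 13)."""
    p, mean, chi = moments(phi, J, words)
    k = words.sum(axis=1)
    pair = k * (k - 1.0)                  # sum over ordered pairs n_a n_b
    # sum_{a, b != a} [<n_a n_b><n_k> - <n_a n_b n_k>] for each k
    rhs = (p @ pair) * mean - (p * pair) @ words
    return phi + dJ * np.linalg.solve(chi, rhs)

def solve_fields(q, J_target, n_steps=200, tol=1e-8):
    """Fields that reproduce spike probabilities q at coupling J_target."""
    q = np.asarray(q, dtype=float)
    words = np.array(list(itertools.product([0, 1], repeat=len(q))), dtype=float)
    phi = np.log(q / (1.0 - q))           # exact solution at J = 0
    J, dJ = 0.0, J_target / n_steps
    for _ in range(n_steps):
        phi = step_fields(phi, J, dJ, words)
        J += dJ
        for _ in range(50):               # correct accumulated error
            _, mean, chi = moments(phi, J, words)
            if np.max(np.abs(mean - q) / q) <= tol:
                break
            phi = phi + np.linalg.solve(chi, q - mean)
    return phi, words
```

Because the enumeration grows as 2^N, this sketch is practical only for small groups of cells.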

Once we create a model population response, 
we sum the spike counts across the full time window 
of the response to motion, and compute the correlation 
coefficients between counts in all pairs of cells in our 
model population. The mean of these coefficients 
provides an index for the overall strength of the 
correlations. Experimentally, for neurons in MT, the 
correlation coefficients are in the range from 0.1 to 0.2, 
which corresponds to J = 0.11 to 0.16 in our models. 
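
The correlation index can be computed as sketched below, given sampled population responses. The array layout (trials by time bins by neurons) and the function name are assumptions of this example.

```python
import numpy as np

def mean_count_correlation(words):
    """Mean pairwise correlation coefficient of spike counts.

    words: binary array of shape (n_trials, n_bins, n_neurons) holding
           sampled population responses across the response window.
    """
    counts = words.sum(axis=1)                    # spike count per trial and cell
    c = np.corrcoef(counts, rowvar=False)         # (N, N) correlation matrix
    upper = np.triu_indices_from(c, k=1)          # each pair counted once
    return float(c[upper].mean())
```

Each cell's count must vary across trials for its correlation coefficients to be defined.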